An Overview of Haplotyping via Perfect Phylogeny: Theory, Algorithms and Programs
Abstract
The next high-priority phase of human genomics will involve the development of a full Haplotype Map of the human genome. The map will be used in large-scale population screens to associate specific haplotypes with specific complex, genetically influenced diseases. A key, perhaps bottleneck, problem is to determine haplotype pairs computationally from genotype data. An approach that views this problem in the context of perfect phylogeny was introduced in 2002, along with an efficient solution. A variation of that method that is slower in the worst case was later implemented. Two simpler methods for the perfect phylogeny approach, also slower in the worst case than the first algorithm, were later developed. We have implemented and tested all three of these approaches in order to compare and explain their practical efficiencies. In this talk I will introduce the haplotyping problem and the three algorithms for its solution that have been implemented. I will discuss two empirical observations: a strong phase transition in the frequency of obtaining a unique solution as a function of the number of individuals in the input; and results of using the method to find nonoverlapping intervals where the haplotyping solution is highly reliable, as a function of the level of recombination in the data. Finally, I will discuss the biological basis for the size of these tests.
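To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not any of the three algorithms mentioned in the abstract. It assumes a common encoding that the abstract does not specify (0 or 1 for a homozygous site, 2 for a heterozygous site) and assumes an all-zero ancestral sequence, so that a set of haplotypes admits a perfect phylogeny exactly when no pair of sites shows all three of the combinations 01, 10 and 11. The function names `expand`, `has_perfect_phylogeny` and `pph_solutions` are hypothetical.

```python
from itertools import product


def expand(genotype):
    """Yield each unordered haplotype pair that explains one genotype.

    Assumed encoding: 0 or 1 = homozygous site, 2 = heterozygous site.
    """
    het = [i for i, g in enumerate(genotype) if g == 2]
    if not het:
        h = tuple(genotype)
        yield (h, h)
        return
    # Fix the first heterozygous site to 0 in the first haplotype so that
    # mirrored pairs (h1, h2) and (h2, h1) are not both produced.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(genotype), list(genotype)
        for site, b in zip(het, (0,) + bits):
            h1[site] = b
            h2[site] = 1 - b
        yield (tuple(h1), tuple(h2))


def has_perfect_phylogeny(haplotypes):
    """Rooted test (all-zero ancestor assumed): no two sites show 01, 10 and 11."""
    rows = list(haplotypes)
    if not rows:
        return True
    m = len(rows[0])
    forbidden = {(0, 1), (1, 0), (1, 1)}
    for i in range(m):
        for j in range(i + 1, m):
            seen = {(r[i], r[j]) for r in rows}
            if forbidden <= seen:
                return False
    return True


def pph_solutions(genotypes):
    """Exhaustively list haplotype-pair assignments that fit a perfect phylogeny.

    Exponential in the number of heterozygous sites; for illustration only.
    """
    solutions = []
    for choice in product(*(list(expand(g)) for g in genotypes)):
        haplotypes = {h for pair in choice for h in pair}
        if has_perfect_phylogeny(haplotypes):
            solutions.append(choice)
    return solutions


if __name__ == "__main__":
    print(len(pph_solutions([(2, 2)])))        # one individual
    print(pph_solutions([(2, 2), (1, 1)]))     # two individuals
```

In this toy input, a single doubly heterozygous individual admits two explanations, while adding a second individual leaves only one. This is meant only to illustrate what a unique solution is, not to reproduce the empirical results reported in the talk.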
Similar resources
Perfect Path Phylogeny Haplotyping with Missing Data Is Fixed-Parameter Tractable
Haplotyping via perfect phylogeny is a method for retrieving haplotypes from genotypes. Fast algorithms are known for computing perfect phylogenies from complete and error-free input instances—these instances can be organized as a genotype matrix whose rows are the genotypes and whose columns are the single nucleotide polymorphisms under consideration. Unfortunately, in the more realistic setti...
Efficient Computation of Template Matrices
The computation of template matrices is the bottleneck of simple algorithms for perfect phylogeny haplotyping and for perfect phylogeny under mutation and constrained recombination. The fastest algorithms known so far compute them in O(nm) time. In this paper, we describe an algorithm for computing template matrices in O(nm/ log(n)) time. We also present and discuss a conjecture that implies an...
Haplotyping with missing data via perfect path phylogenies
Computational methods for inferring haplotype information from genotype data are used in studying the association between genomic variation and medical condition. Recently, Gusfield proposed a haplotype inference method that is based on perfect phylogeny principles. A fundamental problem arises when one tries to apply this approach in the presence of missing genotype data, which is common in pr...
Computational Complexity of Perfect-Phylogeny-Related Haplotyping Problems
Haplotyping, also known as haplotype phase prediction, is the problem of predicting likely haplotypes based on genotype data. This problem, which has strong practical applications, can be approached using both statistical and combinatorial methods. While the most direct combinatorial approach, maximum parsimony, leads to NP-complete problems, the perfect phylogeny model proposed by Gusfi...
1 Haplotype Inference
Fresh Pond Research Institute. Contents: 1.1 Abstract; 1.2 Introduction to Variation, SNPs, Genotypes, and Haplotypes (The Biological Problem • The Computational Problems • The Need for a Genetic Model • Two Major Ap...
Publication date: 2003